docs(evals): cross-reference the two runners in their docstrings #13
RyanAlberts wants to merge 4 commits into main
Conversation
The repo already shipped examples/sample-prd-output.md but it wasn't linked from anywhere, while the gallery still listed /prd under "Coming soon". /prd is the headline skill in the main README walkthrough, so the empty slot was the most damaging gap. Add a /prd section that links the existing sample and remove the now-filled row.
skills/prd-from-signal.md listed only 5 PRD sections inline — missing "User Experience / Flow" and "Open Questions / Risks" — even though templates/prd-template.md (referenced from step 3) defines 6, and the cross-platform claude-skills/pmstack-prd/SKILL.md already lists all 6. Make the template the canonical source, list all 6 sections in the skill, and add a note that the template wins on future drift.
./setup --help used a hard-coded sed range '2,8p' that included the first non-comment line. Replace with awk that prints the leading comment block until the first non-# line, so help can't drift again when the comment block grows or shrinks. Before: last line of help was 'set -euo pipefail' (a bash directive). After: clean usage block, no shell internals.
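For readers who want the stopping rule spelled out, here is the same idea sketched in Python; the setup script itself uses a one-line awk program, so this is an illustration of the logic rather than the code that shipped:

```python
def leading_comment_block(path: str) -> str:
    """Return the comment block at the top of a shell script, minus the shebang.

    Same rule as the awk replacement described above: skip line 1, keep lines
    while they start with '#', and stop at the first non-comment line, so the
    help text tracks the comment block however long it grows or shrinks.
    """
    kept = []
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i == 0:                 # shebang line, never part of the help text
                continue
            if line.startswith("#"):
                kept.append(line.lstrip("#").strip())
            else:
                break                  # first non-comment line ends the block
    return "\n".join(kept)


if __name__ == "__main__":
    # "setup" is the repo's setup script mentioned above; adjust the path as needed.
    print(leading_comment_block("setup"))
```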
The repo has two scripts whose names look like cousins — bin/run-eval.py (user-facing /run-eval; grades a third-party AI target) and evals/runner.py (internal /eval-self; grades pmstack's own skills). They are not duplicates: different inputs, different invocation paths, different output formats. But the naming risks reader confusion when someone greps for "eval runner" and grabs the wrong file. Add a one-paragraph "not to be confused with X" pointer to each docstring so the distinction is visible at a glance.
Code Review
This pull request clarifies the distinction between evaluation scripts, improves the robustness of the setup script's help output, and enhances the documentation and instructions for the /prd skill. Specifically, it adds cross-references to bin/run-eval.py and evals/runner.py, replaces a fragile sed command with awk in the setup script, and provides a detailed example for PRD generation in the README. Feedback was provided to consolidate redundant instructions in the prd-from-signal.md skill definition to ensure better clarity for the LLM.
1. Analyze the signal to extract the underlying user problem (the "Why") — not just restate the surface ask.
2. Name the target audience precisely — persona, segment, JTBD.
3. Propose a solution that addresses the root problem.
4. Fill out the PRD using the canonical structure in [`templates/prd-template.md`](../templates/prd-template.md). All six sections are required:
   1. Problem Statement (what / who / why now)
   2. Proposed Solution (description + value proposition)
   3. Key Features (MoSCoW — specific features, not adjectives)
   4. User Experience / Flow (step-by-step + empty / loading / error / success states)
   5. Success Metrics (North Star + 2–3 supporting + 1–2 counter-metrics)
   6. Open Questions / Risks (technical, business, unresolved decisions)
5. Save the output to `outputs/prd-[topic-slug]-[YYYY-MM-DD].md`.
The instructions have a slight redundancy. Item 2, 'Name the target audience precisely', is also covered within item 4.1, 'Problem Statement (what / who / why now)'. This could be confusing for the LLM processing these instructions.
To improve clarity, I suggest removing the separate top-level item for the target audience and incorporating its detail into the 'Problem Statement' sub-item.
Suggested revision:

1. Analyze the signal to extract the underlying user problem (the "Why") — not just restate the surface ask.
2. Propose a solution that addresses the root problem.
3. Fill out the PRD using the canonical structure in [`templates/prd-template.md`](../templates/prd-template.md). All six sections are required:
   1. Problem Statement (what / who / why now). The "who" should be a precise target audience (persona, segment, JTBD).
   2. Proposed Solution (description + value proposition)
   3. Key Features (MoSCoW — specific features, not adjectives)
   4. User Experience / Flow (step-by-step + empty / loading / error / success states)
   5. Success Metrics (North Star + 2–3 supporting + 1–2 counter-metrics)
   6. Open Questions / Risks (technical, business, unresolved decisions)
4. Save the output to `outputs/prd-[topic-slug]-[YYYY-MM-DD].md`.
Summary
The repo has two scripts whose names look like cousins:
- `bin/run-eval.py` — user-facing `/run-eval`, grades a third-party AI target described by an eval YAML
- `evals/runner.py` — internal `/eval-self`, grades pmstack's own skills via the `claude` CLI

They are not duplicates (different inputs, invocation paths, output formats), but a reader grepping for "eval runner" can easily grab the wrong file. Add a one-paragraph "not to be confused with X" pointer to each docstring.
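One plausible shape for that pointer, sketched for `bin/run-eval.py` (the wording is illustrative, not the exact docstring merged in this PR; `evals/runner.py` would carry the mirror-image note pointing back here):

```python
#!/usr/bin/env python3
"""Grade a third-party AI target described by an eval YAML (user-facing /run-eval).

Not to be confused with evals/runner.py, which backs the internal /eval-self
skill and grades pmstack's own skills via the `claude` CLI. The two take
different inputs, are invoked differently, and produce different output
formats; if you are grading an external target, this is the right file.
"""
```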
Test plan
- `python3 -m py_compile bin/run-eval.py evals/runner.py` — both still parse
- `head -10` each file and confirm the cross-references are clear